Call data

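This analysis uses the Google Play Store apps dataset published on Kaggle. Invalid or erroneous records were removed first, and the analysis then focuses on the distribution of app ratings.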

library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
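# Data from https://www.kaggle.com/lava18/google-play-store-apps
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# which is what the per-level counts in the summary() output below rely on.
mydata <- read.csv("googleplaystore.csv", stringsAsFactors = TRUE)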
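The cleaning step mentioned in the introduction is not shown in the report. A minimal sketch of what such a step could look like follows, on a small made-up data frame. The rule used here (keep only rows whose Rating is present and lies between 1 and 5) is an assumption for illustration, not necessarily the rule the author applied, and the names `raw`, `is_valid`, and `cleaned` are hypothetical.

```r
# Toy stand-in for the raw file; values are invented for illustration only.
raw <- data.frame(
  App      = c("A", "B", "C", "D"),
  Category = c("GAME", "TOOLS", "FAMILY", "FAMILY"),
  Rating   = c(4.5, NaN, 19, 3.8),
  stringsAsFactors = FALSE
)

# Assumed validity rule: Rating must be present and within the 1..5 scale.
# is.na() is TRUE for both NA and NaN, so missing ratings are dropped too.
is_valid <- !is.na(raw$Rating) & raw$Rating >= 1 & raw$Rating <= 5
cleaned  <- raw[is_valid, ]

nrow(cleaned)  # 2 rows remain: apps "A" and "D"
```

The same filter could be applied to the real data frame before calling `summary()`, if the CSV has not already been cleaned.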

Data Structure

Before the analysis, here is an overview of the data used. The dataset contains 9,244 apps across 33 categories. Family is the largest category, followed by Game and then Tools.
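Counts like these can be checked directly rather than read off the summary. The sketch below uses a hypothetical helper, `category_overview()`, demonstrated on a toy data frame. It assumes only that the data frame has a `Category` column, as `mydata` does.

```r
# Hypothetical helper: row count, number of distinct categories, and the
# n most frequent categories of a data frame with a Category column.
category_overview <- function(df, n = 3) {
  # factor() drops unused levels, so only categories that occur are counted
  counts <- sort(table(factor(df$Category)), decreasing = TRUE)
  list(
    n_apps       = nrow(df),
    n_categories = length(counts),
    top          = head(counts, n)
  )
}

# Toy data for illustration only
toy <- data.frame(
  Category = c("FAMILY", "GAME", "FAMILY", "TOOLS", "GAME", "FAMILY")
)
overview <- category_overview(toy)
overview$n_apps        # 6
overview$n_categories  # 3
names(overview$top)    # "FAMILY" "GAME" "TOOLS"
```

Calling `category_overview(mydata)` on the loaded data should return the app count, the category count, and the three largest categories described above.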

summary(mydata)
##                                                  App      
##  ROBLOX                                            :   9  
##  CBS Sports App - Scores, News, Stats & Watch Live :   8  
##  Candy Crush Saga                                  :   7  
##  Duolingo: Learn Languages Free                    :   7  
##  ESPN                                              :   7  
##  Bleacher Report: sports news, scores, & highlights:   6  
##  (Other)                                           :9200  
##           Category        Rating         Reviews        
##  FAMILY       :1725   Min.   :1.000   Min.   :       1  
##  GAME         :1073   1st Qu.:4.000   1st Qu.:     182  
##  TOOLS        : 727   Median :4.300   Median :    5718  
##  PRODUCTIVITY : 350   Mean   :4.191   Mean   :  501118  
##  MEDICAL      : 343   3rd Qu.:4.500   3rd Qu.:   79312  
##  COMMUNICATION: 326   Max.   :5.000   Max.   :78158306  
##  (Other)      :4700                                     
##                  Size           Installs      Type          Price     
##  Varies with device:1607   1000000+ :1557   Free:8604   0      :8604  
##  12M               : 161   10000000+:1236   Paid: 640   $2.99  : 112  
##  14M               : 160   100000+  :1142               $0.99  : 107  
##  11M               : 159   10000+   : 999               $4.99  :  69  
##  13M               : 157   5000000+ : 733               $1.99  :  58  
##  15M               : 155   1000+    : 708               $3.99  :  58  
##  (Other)           :6845   (Other)  :2869               (Other): 236  
##          Content.Rating           Genres        Last.Updated 
##  Adults only 18+:   3   Tools        : 726   3-Aug-18 : 307  
##  Everyone       :7339   Entertainment: 526   2-Aug-18 : 274  
##  Everyone 10+   : 386   Education    : 468   1-Aug-18 : 269  
##  Mature 17+     : 446   Productivity : 350   31-Jul-18: 269  
##  Teen           :1069   Action       : 349   30-Jul-18: 197  
##  Unrated        :   1   Medical      : 343   25-Jul-18: 156  
##                         (Other)      :6482   (Other)  :7772  
##              Current.Ver               Android.Ver  
##  Varies with device:1390   4.1 and up        :2029  
##  1                 : 473   Varies with device:1294  
##  1.1               : 206   4.0.3 and up      :1222  
##  1.2               : 132   4.0 and up        :1118  
##  2                 : 128   4.4 and up        : 862  
##  1.3               : 118   2.3 and up        : 578  
##  (Other)           :6797   (Other)           :2141
plot_ly(mydata, x = ~Category, color = ~Category, type = "histogram")

Rating

my.plot3 <- ggplot(mydata, aes(x = Rating))
my.plot3 <- my.plot3 +
  geom_histogram(binwidth = 0.1, fill = "steelblue")
my.plot3

plot_ly(mydata, x = ~Rating, color = ~Category, type = "box")